Types of validity in early theory - Sakineh Yeganegi
Three Types of Validity in Early Theory
Validity is the most important single characteristic of a test. If a test is not valid, it is not worth much even if it is reliable. The reason is that a reliable test may not be valid, whereas a valid test is to some extent reliable as well. Validity is directly related to the content and form of the test (SAMT, p. 148).
According to Brown (''Language Assessment: Principles and Classroom Practices'', 2004, p. 22), validity is the most complex criterion of an effective test and arguably the most important principle: it is “the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment.” Brown distinguishes five types of validity, or evidence:
#Content-Related Evidence
#Criterion-Related Evidence
#Construct-Related Evidence
#Consequential Validity
#Face Validity
In ''Fundamental Considerations in Language Testing'' (p. 236), on the other hand, Lyle F. Bachman cites Messick (1989), who describes validity as ‘an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores’. It has been traditional to classify validity into different types such as content, criterion, and construct validity. Compared with Brown's classification, this one leaves out consequential validity and face validity.
In ''Language Testing and Assessment'', Glenn Fulcher and Fred Davidson consider validity in the early theory that emerged after the Second World War and trace the changes that have occurred since then. In that early theory there are three types of validity, each related to the kind of evidence that would count towards demonstrating that a test was valid.
Cronbach and Meehl (1955) described these types of validity as:
#Criterion-oriented validity
##Predictive validity
##Concurrent validity
#Content validity
#Construct validity
1. Criterion-oriented validity: here the tester is interested in the relationship between a particular test and a criterion about which we wish to make predictions.
Predictive validity: when test scores are used to predict some future criterion, such as academic success, we speak of predictive validity.
Concurrent validity: when the scores are used to predict a criterion measured at the same time the test is given, we speak of concurrent validity.
2. Content validity: content validity is defined as any attempt to show that the content of the test is a representative sample from the domain that is to be tested.
3. Construct validity: the first problem with construct validity is defining what a ‘construct’ is. Perhaps the easiest way to understand the term is to think of the many abstract nouns that we use on a daily basis, such as anxiety, intelligence, and so on. In general, a construct must have two properties. Firstly, it must be defined in such a way that it becomes measurable. Secondly, it should be defined in such a way that it can have relationships with other, different constructs. Put another way, concepts become constructs when they can be made ‘operational’: we can measure them in a test of some kind by linking the term to something observable.
Construct validity and truth
In the early history of validity theory there was an assumption that there is such a thing as a ‘psychologically real construct’ that has an independent existence in the test taker, and that the test scores represent the degree of presence or absence of this very real property. As Cronbach and Meehl (1955: 284) put it:
Construct validation takes place when an investigator believes that his instrument reflects a particular construct, to which are attached certain meanings.
The proposed interpretation generates specific testable hypotheses, which are a means of confirming or disconfirming the claim.
They assumed that their constructs actually existed in the heads of the test takers. For Cronbach and Meehl (1955: 248), to ‘make clear what something is’ means to set forth the laws in which it occurs. They refer to the interlocking system of laws which constitute a theory as a nomological network.
The idea of a nomological network is not difficult to grasp. Firstly, it contains a number of constructs, and their names are abstract, like those mentioned above. In language teaching and testing, ‘fluency’ and ‘accuracy’ are two well-known constructs. Secondly, the nomological network contains the observable variables, that is, those things that we can see and measure directly, whereas we cannot see ‘fluency’ and ‘accuracy’ directly. In testing and assessment this meant that if there is no possible way to test the hypotheses created by the relationships between observable variables, between observable variables and constructs, and between constructs, the theory is meaningless, or not ‘scientifically admissible’.
Construct definition lies at the center of testing and assessment, because any validity study is an investigation of the intended meaning and interpretation of test scores. As Messick (1989: 26) puts it (using the term ‘instrumentalist’ for ‘pragmatist’): ‘According to the instrumentalist theory of truth, a statement is true if it is useful in directing inquiry or guiding action.’ Writing from a post-positivist perspective, Messick (1989: 23) also added:
Nomological networks are viewed as an illuminating way of speaking systematically about the role of constructs in psychological theory and measurement, but not as the only way. The nomological framework offers a useful guide for disciplined thinking about the process of validation but cannot serve as the prescriptive validation model to the exclusion of other approaches.
Peirce believed that one day, at some point so far into the future that no one can see it, all researchers would come to a ‘final conclusion’ that is the truth, and to which our present truths approximate. Validity theory occupies an uncomfortable philosophical space in which the relationship between theory and evidence is sometimes unclear and messy, because theory is always evolving and new evidence is continually collected.
Cutting the validity cake
Since Cronbach and Meehl, the study of validity has become one of the central enterprises in psychological, educational and language testing. Messick (1989: 20) wrote:
Traditional ways of cutting and combining evidence of validity, as we have seen, have led to three major categories of evidence: content-related, criterion-related, and construct-related. However, because content- and criterion-related evidence contribute to score meaning, they have come to be recognized as aspects of construct validity. In a sense, then, this leaves only one category, namely, construct-related evidence.
Messick set out to produce a ‘unified validity framework’, in which different types of evidence contribute in their own way to our understanding of construct validity. In doing so he fundamentally changed the way in which we understand validity. He described validity as:
an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment. (Messick, 1989: 13)
‘Validity’ is therefore not a property of a test or assessment but the degree to which we are justified in making an inference to a construct from a test score. Messick’s way of looking at validity has become the accepted paradigm in psychological, educational and language testing. This can be seen in the evolution of the Standards for Educational and Psychological Testing.
In the Technical Recommendations (APA, 1954) the ‘four types’ of validity were described, and by 1966 these had become the ‘three types’ of content, criterion and construct validity. The 1974 edition kept the same categorization, but claimed that the types were closely related. In 1985 the categories were abandoned and the unitary interpretation became explicit:
Validity is the most important consideration in test evaluation. The concept refers to the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores. Test validation is the process of accumulating evidence to support such inferences. A variety of inferences may be made from scores produced by a given test, and there are many ways of accumulating evidence to support any particular inference. Validity, however, is a unitary concept. Although evidence may be accumulated in many ways, validity always refers to the degree to which that evidence supports the inferences that are made from the score. The inferences regarding specific uses of a test are validated, not the test itself.
Test usefulness
Bachman and Palmer (1996: 18) have used the term ‘usefulness’ as a superordinate in place of construct validity. Usefulness includes reliability, which is the consistency of test scores across facets of the test; construct validity; and authenticity, defined as the relationship between test task characteristics and the characteristics of tasks in the real world. Interactiveness is the degree to which the individual test taker’s characteristics (language ability, background knowledge and motivations) are engaged when taking a test. Practicality is concerned with test implementation rather than the meaning of test scores.
The validity cline
Chapelle has characterized three current approaches to validity, as follows:
1. Trait theory: a ‘trait’ here is no different from the notion of a ‘construct’. It is assumed that the construct to be tested is an attribute of the test taker.
The test taker’s knowledge and processes are assumed to be stable and real, and the test is designed to measure these. Score meaning is therefore established on the basis of correspondence between the score and the actuality of the construct in the test taker.
2. Behaviourist approach: the test score is mostly affected by context, such as physical setting, topic and participants. These are typically called ‘facets’ in the language testing literature. In ‘real world’ communication there is always a context: a place where the communication typically takes place, a subject, and people who talk. The behaviourist approach is typified in the work of Tarone (1998), in which it is argued that performance on test tasks varies (within individuals) by task and by features or facets of the task. She argues that the idea of a ‘stable competence’ is untenable, and that ‘variable capability’ is the only defensible position. In other words, there are no constructs that really exist within individuals. Rather, our abilities are variable, and change from one situation to another. Fulcher (1995) and Fulcher and Márquez Reiter (2003) have shown that in a behaviourist approach, each test would be a test of performance in the specific situation defined in the facets of the test situation. ‘Validity’ would be the degree to which it could be shown that there is a correspondence between the real-world facets and the test facets, and score meaning could only be generalized to corresponding real-world tasks.
These two theories are therefore very different in how they understand score meaning, and we can understand the difference in terms of the concept of ‘generalizability’.
3. Pragmatic approach: in language testing there is no such thing as an ‘absolute’ answer to the validity question. The role of the language tester is to collect evidence to support a test use and interpretation that a larger community, the stakeholders (students, testers, teachers and society), accepts.
But this accepted truth may change as new evidence comes to light. As James (1907: 88) put it, ‘truth ''happens'' to an idea’ through a process, and ‘its validity is the process of its valid-''ation''’ (italics in the original).
Peirce (undated: 4–5) has suggested that the kinds of arguments we construct in language testing may be evaluated through abduction, or what he later called retroduction. He explains that retroduction is:
the process in which the mind goes over all the facts of the case, absorbs them, digests them, sleeps over them, assimilates them, dreams of them, and finally is prompted to deliver them in a form, which, if it adds something to them, does so only because the addition serves to render intelligible what without it, is unintelligible. I have hitherto called this kind of reasoning which issues in explanatory hypotheses and the like, abduction, because I see reason to think that this is what Aristotle intended to denote by the corresponding Greek term ‘apagoge’ in the 25th chapter of the 2nd Book of his Analytics. But since this, after all, is only conjectural, I have on reflexion decided to give this kind of reasoning the name of retroduction to imply that it turns back and leads from the consequent of an admitted consequence, to its antecedent. Observe, if you please, the difference of meaning between a consequent, the thing led to, and a consequence, the general fact by virtue of which a given antecedent leads to a certain consequent.
In language testing, the method of validation is the same: it involves the successful elimination of alternative explanations of the facts.
For validity investigations, a number of criteria have been established by which we might decide which is the most satisfying explanation of the facts:
Simplicity, otherwise known as Ockham’s Razor, which states: ‘Pluralitas non est ponenda sine necessitate’, translated as ‘Do not multiply entities unnecessarily.’ In practice this means that the least complicated explanation of the facts is to be preferred: the argument that needs the fewest causal links, makes the fewest claims about things existing that we cannot investigate directly, and does not require us to speculate well beyond the evidence available.
Coherence, or the principle that we prefer an argument that is more in keeping with what we already know.
Testability, so that the preferred argument allows us to make predictions about future actions, behaviour, or relationships between variables that we could investigate.
Comprehensiveness, which urges us to prefer the argument that takes account of the most facts and leaves as little unexplained as possible.
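As a worked illustration of the criterion-oriented validity described earlier, the relationship between test scores and a criterion is commonly summarized with a correlation coefficient. The sketch below is a minimal example under stated assumptions: it uses Pearson's r as the index (the text above does not prescribe a particular statistic), the function name `pearson_r` is hypothetical, and the scores are invented purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the two means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical data, invented for illustration only:
# scores on a placement test and the same students' later course grades.
placement_test = [52, 61, 70, 74, 80, 88]
later_course_grade = [55, 60, 68, 79, 77, 90]

# Predictive use: the criterion (course grade) is measured after the test.
predictive_coefficient = pearson_r(placement_test, later_course_grade)
print(round(predictive_coefficient, 2))
```

For concurrent validity the same calculation would apply, with the criterion scores collected at the same time as the test. Under Messick's unitary view, such a coefficient is one piece of evidence bearing on score meaning, not a verdict on the test itself.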